Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "is None" constraints from if statements during inference #1189

Merged
merged 27 commits into from
Jan 6, 2023

Conversation

david-yz-liu
Copy link
Contributor

@david-yz-liu david-yz-liu commented Sep 24, 2021

Steps

  • For new features or bug fixes, add a ChangeLog entry describing what your PR does.
  • Write a good description on what the PR does.

Description

Alright, this is a big one so please don't hold back. This PR adds infrastructure to support constraints on values during inference. Here's a simple example:

from astroid import extract_node

node = extract_node("""
def f(b):
    if b:
        x = 1
    else:
        x = None
    
    if x is not None:
        x #@
""")
print(node.inferred())

This prints [<Const.int l.4 at 0x14232a6e6d0>, <Const.NoneType l.6 at 0x142329b38b0>], using both possible values from the assignment statements. Of course, because of the x is not None if condition, only the int is possible, but astroid currently doesn't take this condition into account.

This PR changes how inference works so that only [<Const.int l.4 at 0x14232a6e6d0>] is printed.

Scope

I reviewed the existing pylint issues related to this (they are generally labelled as "topic-control-flow"). There were five types of constraints that were brought up:

  1. x is None/x is not None
  2. x/not x (truthy/falsey value)
  3. isinstance(x, ...)
  4. x == ...
  5. Combinations of the above with and/or

There were also six primary locations where these constraints could be inferred (orthogonal to the constraints themselves):

  1. if statement (as in above example)

  2. and/or clauses. For example, x is not None and x[0] == 1; the second x should not be inferred as None

  3. ternary if expressions. For example, x[0] if x is not None else 1; the first x should not be inferred as None

  4. reassignment within an if statement, e.g.

    def f():
        a = None
        if a is None:
            a = 1
        return -a
  5. early return/raises within an if statement.

  6. assert statements

This pull request addresses just the item 1's in each list. I used an approach that can handle 1-5 of the constraint types, and 1-3 of the locations. If this approach is approved, I think it would be pretty straightforward to make additional PRs for these items. (More on the other constraint locations below.)

Implementation overview

I added a new Constraint class to represent a constraint on a variable, like x is not None.

  • This is an abstract class, and can be subclassed for different constraint types.

When a variable is inferred (see inference.py, infer_name), we search the ancestor chain from the node and look for containing if statements, using these to accumulate constraints on the variable. These constraints are added to the inference context (added a new attribute to InferenceContext).

Then, in _infer_stmts (bases.py), we filter the inferred values based on whether or not they satisfy all constraints for that variable.

  • Uninferable is considered to satisfy all constraints.
  • ⚠️ If there is at least one inferred value, but all inferred values are filtered out, a new Uninferable value is yielded, rather than yielding nothing at all. I think this is the conservative approach, to avoid errors from the raise_if_nothing_inferred decorator.

The above also applies to attributes (see infer_attribute).

Some comments/questions

  1. Because constraints are gathered by searching up the ancestor chain, this PR doesn't handle real control flow issues, e.g. with the reassignment example from above.
  2. Because the filtering occurs in _infer_stmts, it only applies to variables and attributes. So for example, subscript nodes (e.g. my_array[0]) won't have their inferred values filtered.
  3. Gathering constraints occurs every time infer_name/infer_attribute is called, and since this requires going up the ancestor tree, this adds some overhead to inference. In principle constraints can be gathered through an AST visitor, but I thought that would be more complex so didn't choose it. (Thoughts welcome of course!)
  4. I think adding the constraints to InferenceContext is correct, but to be honest I'm fuzzy on the nuances of this class and how it's used.
  5. Do you like the name "constraint" here? I go back on forth on it, but haven't come up with anything better.

Type of Changes

Type
✨ New feature

Related Issue

Closes with test #791
Closes with test pylint-dev/pylint#157
Closes with test pylint-dev/pylint#1472
Closes with test pylint-dev/pylint#2016
Closes with test pylint-dev/pylint#2631
Closes with test pylint-dev/pylint#2880

@cdce8p cdce8p self-requested a review September 25, 2021 10:18
@Pierre-Sassoulas Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label Sep 26, 2021
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.9.0 milestone Sep 26, 2021
@Pierre-Sassoulas Pierre-Sassoulas self-requested a review September 26, 2021 07:27
@DanielNoord
Copy link
Collaborator

Hi @david-yz-liu, thanks for all of the work here. I think the size (and importance) of this PR has put off people from reviewing this but I think we should try and start the process of getting this merged.

The first question I would have before doing a full review is how we could expand this to include conditional reassignments. I think it would be every important to support the following code pattern:

def func():
    if True:
        to_infer = 1
    else:
        to_infer = 2

    print(to_infer)

This does not need to be added in this PR, but we should at least have a general idea of how we could implement this within the same system. Do/did you have any ideas about this yourself?

@DanielNoord DanielNoord modified the milestones: 2.10.0, 2.11.0 Feb 26, 2022
@david-yz-liu
Copy link
Contributor Author

@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!

Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things in InferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.

However as far I as can remember, existing InferenceContexts don't get passed across statements in a block. I think this is fundamentally why pylint's VariablesChecker is as complex as it is?

But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in the VariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.

Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)

@DanielNoord
Copy link
Collaborator

@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!

Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things in InferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.

However as far I as can remember, existing InferenceContexts don't get passed across statements in a block. I think this is fundamentally why pylint's VariablesChecker is as complex as it is?

But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in the VariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.

Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)

I think that answers it pretty clearly. As you can see I spent some time familiarising myself with the code 😄 I think this could indeed be a very good first step towards some better control flow. I have been thinking about doing something with frames and line numbering to do some more difficult constraints, but at the moment I don't see how we couldn't change the current approach to do that. It all seems flexible enough.

Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.

@david-yz-liu
Copy link
Contributor Author

Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.

Will do. I'll try to look at this more closely tonight or tomorrow. 👍

ChangeLog Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Mar 22, 2022

Pull Request Test Coverage Report for Build 3586640247

  • 92 of 92 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 92.365%

Totals Coverage Status
Change from base Build 3586270619: 0.06%
Covered Lines: 9980
Relevant Lines: 10805

💛 - Coveralls

@david-yz-liu
Copy link
Contributor Author

@DanielNoord Okay I reviewed all of your changes, they look good to me! Yeah I was going for more general types but happy to stay strict for now and expand later if needed. Also thanks for making that Python 3.7 fix, that would have taken me a while to track down 😅.

Anyway as far as I can tell, this is good from my end, just waiting for more detailed review/feedback.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The tests in particular are diversified and very meticulous. This will close a ton of issues and open a lot of possibilities in pylint too.

astroid/constraint.py Outdated Show resolved Hide resolved
astroid/constraint.py Show resolved Hide resolved
astroid/constraint.py Outdated Show resolved Hide resolved
astroid/constraint.py Outdated Show resolved Hide resolved
@DanielNoord
Copy link
Collaborator

@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!

Copy link
Member

@cdce8p cdce8p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! I apologize for the delay again in making these updates (just have had a lot of other things going on).

Not necessary 🙂 We all do.

Notably, I've moved the constraint code back into a single file (constraint.py) and removed the all and instead used the underscore prefix where appropriate.

👍🏻

--
Left some more style comments.

Comment on lines 26 to 29
@classmethod
@abstractmethod
def match(
cls: _ConstraintT, node: _NameNodes, expr: nodes.NodeNG, negate: bool = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a classmethod, the type of the first argument should be a class type not an instance.

Suggested change
@classmethod
@abstractmethod
def match(
cls: _ConstraintT, node: _NameNodes, expr: nodes.NodeNG, negate: bool = False
@classmethod
@abstractmethod
def match(
cls: type[_ConstraintT], node: _NameNodes, expr: nodes.NodeNG, negate: bool = False

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to be Self which I think makes more sense here, let me know if I need to revert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self would be awesome. Unfortunately it isn't supported by mypy yet.
The PR has already been merged, but we need to wait for the next release (1.0.0).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because it was already merged I thought it would be okay to use. pyright understands it and I don't think we will turn on mypy in astroid before 1.0.0 gets released so this doesn't give us any additional work (instead it prevents some 😄 )

Comment on lines 47 to 49
@classmethod
def match(
cls: _ConstraintT, node: _NameNodes, expr: nodes.NodeNG, negate: bool = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@classmethod
def match(
cls: _ConstraintT, node: _NameNodes, expr: nodes.NodeNG, negate: bool = False
@classmethod
def match(
cls: type[_ConstraintT], node: _NameNodes, expr: nodes.NodeNG, negate: bool = False

Same here

astroid/constraint.py Outdated Show resolved Hide resolved
astroid/constraint.py Outdated Show resolved Hide resolved
Comment on lines 61 to 62
and _matches(right, cls.CONST_NONE)
or _matches(left, cls.CONST_NONE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls.CONST_NONE is a type error. See: #1189 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be fixed with using Self?

astroid/constraint.py Outdated Show resolved Hide resolved
astroid/constraint.py Outdated Show resolved Hide resolved
astroid/constraint.py Outdated Show resolved Hide resolved
Comment on lines 130 to 133
for constraint_cls in ALL_CONSTRAINT_CLASSES:
constraint = constraint_cls.match(node, expr, invert)
if constraint:
return constraint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case in which multiple constraint classes could match (in the future)? Does it make sense to change it to an Iterator then, i.e. yield constraint instead of return?

astroid/bases.py Show resolved Hide resolved
@DanielNoord DanielNoord requested a review from cdce8p November 20, 2022 23:14
@DanielNoord
Copy link
Collaborator

I handled the review by Marc and incorporate his comments to help move this PR along. I think we're all eager to get this merged 😄

@david-yz-liu
Copy link
Contributor Author

Thanks @DanielNoord and @cdce8p! I wasn't sure exactly when I'd have time to look at this again so very much appreciate the updates. 👍

BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "AndConstraint" from

if x is not None and isinstance(x, int):
    ... x ...

and so the if node's test is an and of two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a single AndConstraint whose children are the two constraints. (Personally I'd prefer going with the latter approach.)

@DanielNoord
Copy link
Collaborator

BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "AndConstraint" from

if x is not None and isinstance(x, int):
    ... x ...

and so the if node's test is an and of two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a single AndConstraint whose children are the two constraints. (Personally I'd prefer going with the latter approach.)

I think the concept of AndConstraint makes sense. This would remove the need for the yield behaviour.

I implemented the yielding behaviour for now but that could easily be reverted. @cdce8p what do you think?

@codecov
Copy link

codecov bot commented Jan 6, 2023

Codecov Report

Merging #1189 (46c056f) into main (f476ebc) will increase coverage by 0.05%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   92.58%   92.63%   +0.05%     
==========================================
  Files          93       94       +1     
  Lines       10782    10869      +87     
==========================================
+ Hits         9982    10069      +87     
  Misses        800      800              
Flag Coverage Δ
linux 92.40% <100.00%> (+0.06%) ⬆️
pypy 88.54% <96.73%> (+0.05%) ⬆️
windows 92.31% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
astroid/bases.py 87.98% <100.00%> (+0.38%) ⬆️
astroid/constraint.py 100.00% <100.00%> (ø)
astroid/context.py 97.33% <100.00%> (+0.11%) ⬆️
astroid/inference.py 94.54% <100.00%> (+0.03%) ⬆️

@Pierre-Sassoulas Pierre-Sassoulas dismissed cdce8p’s stale review January 6, 2023 19:51

Everything was taken into account.

@Pierre-Sassoulas Pierre-Sassoulas merged commit 21880dd into pylint-dev:main Jan 6, 2023
@jacobtylerwalls
Copy link
Member

Thanks to all for reviews and persistence--huge step forward!

@david-yz-liu
Copy link
Contributor Author

💯 @Pierre-Sassoulas thank you! And thank you all for the efforts in reviewing this and keeping the branch up to date. I'm super excited to see how this evolves!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Notice is not None and similar checks and prevent BadUnaryOperationMessage
6 participants